In [27]: runfile('/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_classification.py', wdir='/Users/jiaxichen/Desktop/Final Project for Data Science/Master')

Reloaded modules: preprocessing, functions

None


=======================================================================

Support Vector Machine-----------------------------------------------------------------------

=======================================================================

SVM Classifier accuracy for TFIDF: 65.500%

cross-validation accuracy scores TFIDF SVM: [0.67 0.67 0.69875 0.65875 0.6525 0.6575 0.695 0.675 0.6775 0.67375]

cross-validation accuracy: 0.673 +/- 0.014
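
Note on the summary line above: it can be reproduced from the ten fold scores. The sketch below is a minimal check, not the script's own code. It assumes the "+/-" figure is a population standard deviation (ddof=0); that assumption matches the logged 0.014, whereas the sample standard deviation (ddof=1) would round to 0.015. The variable names are illustrative only.

```python
from statistics import mean, pstdev

# Fold scores copied from the "TFIDF SVM" cross-validation line in this log.
svm_scores = [0.67, 0.67, 0.69875, 0.65875, 0.6525,
              0.6575, 0.695, 0.675, 0.6775, 0.67375]

cv_mean = mean(svm_scores)
cv_std = pstdev(svm_scores)  # population standard deviation (assumed, ddof=0)

summary = f"cross-validation accuracy: {cv_mean:.3f} +/- {cv_std:.3f}"
print(summary)
```

The same check can be applied to the KNN, Decision Tree and Random Forest fold scores further down.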

Learning Curve for SVM TFIDF


Accuracy Plot for SVM TFIDF


------------Error Evaluation for SVM-------------

Error Evaluation for SVM TFIDF

                precision    recall  f1-score   support

     0:italian       0.70      0.69      0.70       400
     1:mexican       0.44      0.57      0.50       400
 2:southern_us       0.76      0.72      0.74       400
      3:indian       0.67      0.73      0.70       400
     4:chinese       0.68      0.68      0.68       400
      5:french       0.36      0.53      0.43       400
6:cajun_creole       0.77      0.74      0.75       400
        7:thai       0.75      0.80      0.78       400
    8:japanese       0.63      0.60      0.61       400
       9:greek       0.68      0.64      0.66       400
    10:spanish       0.85      0.76      0.80       400
     11:korean       0.79      0.62      0.70       400
 12:vietnamese       0.84      0.81      0.83       400
   13:moroccan       0.80      0.78      0.79       400
    14:british       0.84      0.80      0.82       400
   15:filipino       0.60      0.62      0.61       400
      16:irish       0.58      0.56      0.57       400
   17:jamaican       0.57      0.55      0.56       400
    18:russian       0.73      0.65      0.68       400
  19:brazilian       0.68      0.60      0.64       400

     micro avg       0.67      0.67      0.67      8000
     macro avg       0.69      0.67      0.68      8000
  weighted avg       0.69      0.67      0.68      8000
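
Note on the averages in the SVM report above: the "macro avg" row is the unweighted mean of the per-class values. Because every class has the same support (400), the weighted average coincides with the macro average, and macro recall coincides with overall accuracy (the "micro avg" row). The sketch below is only a rough check that recomputes the macro figures from the rounded per-class numbers printed in the report, so small rounding differences are possible. The list names are illustrative.

```python
from statistics import mean

# Per-class precision and recall copied from the SVM TFIDF report (classes 0..19).
precision = [0.70, 0.44, 0.76, 0.67, 0.68, 0.36, 0.77, 0.75, 0.63, 0.68,
             0.85, 0.79, 0.84, 0.80, 0.84, 0.60, 0.58, 0.57, 0.73, 0.68]
recall = [0.69, 0.57, 0.72, 0.73, 0.68, 0.53, 0.74, 0.80, 0.60, 0.64,
          0.76, 0.62, 0.81, 0.78, 0.80, 0.62, 0.56, 0.55, 0.65, 0.60]

# Macro average: plain mean over classes, ignoring support.
macro_precision = mean(precision)
macro_recall = mean(recall)

print(f"macro precision: {macro_precision:.3f}")
print(f"macro recall:    {macro_recall:.4f}")
```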


Graphs - SVM TFIDF

Confusion matrix, without normalization

Normalized confusion matrix

<Figure size 432x288 with 0 Axes>


<Figure size 432x288 with 0 Axes>


=======================================================================

k-nearest neighbors--------------------------------------------------------------------

=======================================================================

KNN Classifier accuracy for TFIDF: 61.750%

cross-validation accuracy scores TFIDF KNN: [0.62125 0.6175 0.6325 0.60375 0.59125 0.61625 0.64875 0.6075 0.61875 0.6175]

cross-validation accuracy: 0.618 +/- 0.015

Learning Curve for KNN TFIDF


Accuracy Plot for KNN TFIDF


------------Error Evaluation for KNN-------------

Error Evaluation for KNN TFIDF

                precision    recall  f1-score   support

     0:italian       0.50      0.69      0.58       400
     1:mexican       0.44      0.56      0.50       400
 2:southern_us       0.64      0.71      0.67       400
      3:indian       0.61      0.67      0.64       400
     4:chinese       0.56      0.61      0.58       400
      5:french       0.37      0.43      0.40       400
6:cajun_creole       0.65      0.74      0.70       400
        7:thai       0.74      0.75      0.74       400
    8:japanese       0.47      0.59      0.52       400
       9:greek       0.67      0.56      0.61       400
    10:spanish       0.73      0.73      0.73       400
     11:korean       0.75      0.61      0.67       400
 12:vietnamese       0.69      0.81      0.75       400
   13:moroccan       0.78      0.76      0.77       400
    14:british       0.73      0.73      0.73       400
   15:filipino       0.61      0.51      0.55       400
      16:irish       0.57      0.39      0.47       400
   17:jamaican       0.63      0.37      0.47       400
    18:russian       0.72      0.59      0.65       400
  19:brazilian       0.69      0.54      0.61       400

     micro avg       0.62      0.62      0.62      8000
     macro avg       0.63      0.62      0.62      8000
  weighted avg       0.63      0.62      0.62      8000


Graphs - KNN TFIDF

Confusion matrix, without normalization

Normalized confusion matrix

<Figure size 432x288 with 0 Axes>


<Figure size 432x288 with 0 Axes>


=======================================================================

Decision Tree--------------------------------------------------------------------

=======================================================================

Decision Tree Classifier accuracy for TFIDF: 43.938%

cross-validation accuracy scores TFIDF Decision Tree: [0.44625 0.41 0.425 0.46 0.42125 0.43 0.43125 0.4225 0.44 0.44625]

cross-validation accuracy: 0.433 +/- 0.014

Learning Curve for Decision Tree TFIDF


Accuracy Plot for Decision Tree TFIDF


------------Error Evaluation for Decision Tree-------------

Error Evaluation for Decision Tree TFIDF

                precision    recall  f1-score   support

     0:italian       0.53      0.49      0.51       400
     1:mexican       0.28      0.31      0.30       400
 2:southern_us       0.56      0.55      0.55       400
      3:indian       0.43      0.45      0.44       400
     4:chinese       0.39      0.41      0.40       400
      5:french       0.23      0.30      0.26       400
6:cajun_creole       0.58      0.55      0.56       400
        7:thai       0.57      0.54      0.56       400
    8:japanese       0.35      0.37      0.36       400
       9:greek       0.38      0.39      0.38       400
    10:spanish       0.52      0.48      0.50       400
     11:korean       0.45      0.44      0.45       400
 12:vietnamese       0.56      0.53      0.54       400
   13:moroccan       0.51      0.50      0.51       400
    14:british       0.55      0.57      0.56       400
   15:filipino       0.35      0.38      0.36       400
      16:irish       0.32      0.30      0.31       400
   17:jamaican       0.29      0.28      0.28       400
    18:russian       0.53      0.45      0.49       400
  19:brazilian       0.46      0.42      0.44       400

     micro avg       0.43      0.43      0.43      8000
     macro avg       0.44      0.43      0.44      8000
  weighted avg       0.44      0.43      0.44      8000


Graphs - Decision Tree TFIDF

Confusion matrix, without normalization

Normalized confusion matrix

<Figure size 432x288 with 0 Axes>


<Figure size 432x288 with 0 Axes>


=======================================================================

Random Forest--------------------------------------------------------------------

=======================================================================

Random Forest Classifier accuracy for TFIDF: 41.188%

cross-validation accuracy scores TFIDF Random Forest: [0.42625 0.405 0.44875 0.40875 0.405 0.39125 0.43375 0.39625 0.4275 0.4575]

cross-validation accuracy: 0.420 +/- 0.021

Learning Curve for Random Forest TFIDF


Accuracy Plot for Random Forest TFIDF


------------Error Evaluation for Random Forest-------------

Error Evaluation for Random Forest TFIDF

                precision    recall  f1-score   support

     0:italian       0.36      0.30      0.33       400
     1:mexican       0.13      0.49      0.21       400
 2:southern_us       0.54      0.56      0.55       400
      3:indian       0.48      0.50      0.49       400
     4:chinese       0.49      0.22      0.30       400
      5:french       0.14      0.08      0.10       400
6:cajun_creole       0.51      0.56      0.53       400
        7:thai       0.62      0.62      0.62       400
    8:japanese       0.29      0.25      0.27       400
       9:greek       0.55      0.26      0.35       400
    10:spanish       0.50      0.49      0.50       400
     11:korean       0.80      0.42      0.55       400
 12:vietnamese       0.52      0.73      0.61       400
   13:moroccan       0.61      0.55      0.58       400
    14:british       0.44      0.65      0.53       400
   15:filipino       0.51      0.33      0.40       400
      16:irish       0.47      0.14      0.22       400
   17:jamaican       0.38      0.28      0.32       400
    18:russian       0.48      0.48      0.48       400
  19:brazilian       0.51      0.48      0.50       400

     micro avg       0.42      0.42      0.42      8000
     macro avg       0.47      0.42      0.42      8000
  weighted avg       0.47      0.42      0.42      8000


Graphs - Random Forest TFIDF

Confusion matrix, without normalization

Normalized confusion matrix

<Figure size 432x288 with 0 Axes>


<Figure size 432x288 with 0 Axes>



Please enter comma-separated ingredients: salt, water, milk, oil

This is the prediction value ['japanese']


In [28]: runfile('/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_clustering.py', wdir='/Users/jiaxichen/Desktop/Final Project for Data Science/Master')

Reloaded modules: preprocessing, functions

None


Traceback (most recent call last):


File "<ipython-input-28-fa0a2050625e>", line 1, in <module>

runfile('/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_clustering.py', wdir='/Users/jiaxichen/Desktop/Final Project for Data Science/Master')


File "/Users/jiaxichen/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 704, in runfile

execfile(filename, namespace)


File "/Users/jiaxichen/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile

exec(compile(f.read(), filename, 'exec'), namespace)


File "/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_clustering.py", line 56, in <module>

complete_vector, x_train, x_test, y_train, y_test, df_sample = classify_preprocessing(numCuisine)


ValueError: too many values to unpack (expected 6)
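
Note on the traceback above: this ValueError is raised when the right-hand side yields more values than there are names on the left. The log does not show how many values classify_preprocessing actually returns, so the sketch below uses a hypothetical stand-in function (fake_preprocessing, returning seven placeholder values) purely to reproduce the message and to show one defensive pattern, star-unpacking, which collects any surplus values instead of failing.

```python
def fake_preprocessing():
    """Hypothetical stand-in for the real function; returns seven placeholders."""
    return "vector", "x_train", "x_test", "y_train", "y_test", "df_sample", "extra"


# Unpacking seven values into six names reproduces the error from the log.
try:
    a, b, c, d, e, f = fake_preprocessing()
except ValueError as exc:
    message = str(exc)
    print(message)

# Star-unpacking keeps the first six values and gathers the rest in a list.
a, b, c, d, e, f, *rest = fake_preprocessing()
print(rest)
```

Star-unpacking hides a signature mismatch rather than fixing it, so updating the call site to match the function's actual return values is usually the cleaner option.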



In [29]:


In [29]: runfile('/Users/jiaxichen/Desktop/Final Project for Data Science/Master/main_clustering.py', wdir='/Users/jiaxichen/Desktop/Final Project for Data Science/Master')

Reloaded modules: preprocessing, functions

None


List of countries: ['Greece', 'Texas', 'Philippines', 'India', 'Jamaica', 'Spain', 'Italy', 'Mexico', 'China', 'Britian', 'Thailand', 'Vietnam', 'United States', 'Brazil', 'France', 'Japan', 'Ireland', 'Korea', 'Morocco', 'Russia']


Please select your nationality:China

cuisine ... ingredients_clean_string

1646 chinese ... fresh ginger , rolls , white vinegar , dry she...

1708 chinese ... spices , spinach leaves , long-grain rice , wa...

1679 chinese ... red chili peppers , sliced shallots , fish sau...

1684 chinese ... sugar , sesame oil , scallions , black vinegar...

1752 chinese ... ground ginger , reduced sodium soy sauce , bab...

1885 chinese ... soy sauce , sesame oil , carrots , lettuce , c...

1874 chinese ... jumbo shrimp , fresh ginger , bell pepper , ch...

1995 chinese ... jasmine rice , garlic , carrots , shallots , s...

1691 chinese ... soy sauce , scallions , garlic , pork loin cho...

1987 chinese ... water , sesame oil , rice vinegar , hoisin sau...


[10 rows x 4 columns]

------------------------------------

Here comes the good stuff



/Users/jiaxichen/anaconda3/lib/python3.7/site-packages/matplotlib/figure.py:98: MatplotlibDeprecationWarning:



Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.




Enter the id of the recipe you like: 1684

This is the closest cuisine to your selection: korean

recipe_id ... similarity_score

0 19747 ... 24.899

1 38856 ... 25.353

2 42964 ... 38.275


[3 rows x 4 columns]


This is the Kappa Score: -0.03128381278403536

This is the Silhouette score: 0.008061962380614962

This is the Rand index: 0.05922046692167975

Homogeneity Score: 0.20900771084265996

Completeness Score : 0.22929528779329858

V_measure: 0.2186819773436727

Adjusted Rand Score: 0.05922046692167975

Adjusted Mutual Info Score: 0.2013478993524601
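
Note on the clustering scores above: two of them can be sanity-checked by hand. The V-measure is the harmonic mean of homogeneity and completeness, so the logged value should follow from the two lines before it. Separately, the line labelled "Rand index" prints exactly the same number as the adjusted score, which suggests (an inference from the log, not something it states) that both come from the same adjusted computation; an unadjusted Rand index is a different quantity. The sketch below is a minimal pure-Python illustration. The helper names and the four-point toy labelling are made up for the example, and it does not handle the degenerate case where the adjusted denominator is zero.

```python
from collections import Counter

# Values copied from the log.
homogeneity = 0.20900771084265996
completeness = 0.22929528779329858

# V-measure (beta = 1): harmonic mean of homogeneity and completeness.
v_measure = 2 * homogeneity * completeness / (homogeneity + completeness)
print(v_measure)


def pairs_of(k):
    """Number of unordered pairs that can be drawn from k items."""
    return k * (k - 1) // 2


def rand_scores(labels_true, labels_pred):
    """Return (Rand index, adjusted Rand index) via pair counting."""
    total = pairs_of(len(labels_true))
    same_both = sum(pairs_of(c) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(pairs_of(c) for c in Counter(labels_true).values())
    same_pred = sum(pairs_of(c) for c in Counter(labels_pred).values())

    # Pairs on which the two labellings agree (together in both, or apart in both).
    agreeing = total + 2 * same_both - same_true - same_pred
    rand = agreeing / total

    # Chance correction: subtract the expected index, rescale by the maximum.
    expected = same_true * same_pred / total
    maximum = (same_true + same_pred) / 2
    adjusted = (same_both - expected) / (maximum - expected)
    return rand, adjusted


# Toy example (hypothetical labels): the two indices differ noticeably.
rand, adjusted = rand_scores([0, 0, 1, 1], [0, 0, 1, 2])
print(rand, adjusted)
```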

/Users/jiaxichen/anaconda3/lib/python3.7/site-packages/sklearn/metrics/cluster/supervised.py:732: FutureWarning:


The behavior of AMI will change in version 0.22. To match the behavior of 'v_measure_score', AMI will use average_method='arithmetic' by default.



In [30]: